```python
import asyncio
import os
import albumentations as A
import cv2
import random
import numpy as np
import math
# Self-written class to manage async logging
from async_logger_manager import (get_logger, AsyncLoggerManager)
```

## Understanding the Data
In my previous project I scraped this website for my data. That whole process took around 14 hours, and repeating it on top of downloading all the images would have taken far longer than I wanted. Prior to my last project, while researching pokemon card data, I found this API service. They have a GitHub repository with all the pokemon card data (including image download links) in JSON format.
Finding that GitHub repository was a lifesaver and, perhaps, cut development time in half. From the data they provided I was able to get all pokemon card data from the Scarlet and Violet series:
| Set ID | Set Name | Number of Cards | Release Date |
|---|---|---|---|
| svp | Scarlet & Violet Black Star Promos | 165 | 2023/01/01 |
| sve | Scarlet & Violet Energies | 16 | 2023/03/31 |
| sv1 | Scarlet & Violet | 258 | 2023/03/31 |
| sv2 | Paldea Evolved | 279 | 2023/06/09 |
| sv3 | Obsidian Flames | 230 | 2023/08/11 |
| sv4 | Paradox Rift | 266 | 2023/11/03 |
| sv5 | Temporal Forces | 218 | 2024/03/22 |
| sv6 | Twilight Masquerade | 226 | 2024/05/24 |
| sv7 | Stellar Crown | 175 | 2024/09/13 |
| sv8 | Surging Sparks | 252 | 2024/11/08 |
| sv9 | Journey Together | 190 | 2025/03/28 |
| sv10 | Destined Rivals | 244 | 2025/05/30 |
| sv3pt5 | 151 | 207 | 2023/09/22 |
| sv4pt5 | Paldean Fates | 245 | 2024/01/26 |
| sv6pt5 | Shrouded Fable | 99 | 2024/08/02 |
| sv8pt5 | Prismatic Evolutions | 180 | 2025/01/17 |
| zsv10pt5 | Black Bolt | 172 | 2025/07/18 |
| rsv10pt5 | White Flare | 173 | 2025/07/18 |
The total number of cards in the Scarlet and Violet series is 3595.
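The per-set counts above come straight from those JSON files. Here's a minimal sketch of the tally, assuming the repository's `cards/en/<set_id>.json` layout where each file is a JSON array of card objects (verify against the repo before relying on this):

```python
import json
import os

CARDS_DIR = "./pokemon-tcg-data/cards/en"  # hypothetical local checkout path
SV_SET_IDS = [
    "svp", "sve", "sv1", "sv2", "sv3", "sv3pt5", "sv4", "sv4pt5", "sv5",
    "sv6", "sv6pt5", "sv7", "sv8", "sv8pt5", "sv9", "sv10",
    "zsv10pt5", "rsv10pt5",
]

total = 0
for set_id in SV_SET_IDS:
    with open(os.path.join(CARDS_DIR, f"{set_id}.json"), encoding="utf-8") as f:
        cards = json.load(f)  # one list of card objects per set
    print(f"{set_id}: {len(cards)} cards")
    total += len(cards)

print(f"Total: {total}")  # 3595 for the sets above
```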
## Augmenting the Data
I now have 3595 images, which means 3595 classes to work with. We need more data for each class, so I am going to augment each image 200 times using the Python library albumentations, swapping the background for images from the Describable Textures Dataset (DTD) from the University of Oxford.
Let's walk through the code.
We first import all the libraries. The most important ones are `cv2` and `albumentations`.
### Setting Up Global Variables
Next we have our global variables. Pay attention to `NUM_AUGMENTATIONS_PER_IMAGE`, `target_size`, and `image_size`. We set `NUM_AUGMENTATIONS_PER_IMAGE = 200` because we want 200 different data points for each of our classes. In total this gives us 3595 × 200 = 719,000 images to work with. I will talk more about this below, but I realized later that a dataset this large was not the most agile to work with.
`target_size = 512` sets our final image size to 512x512, which is larger than typically recommended for a smaller model like EfficientNetV2-B0, but I will talk about this later too.
`image_size = 416` sets the pokemon card image's longest side to 416px. This is because I wanted the card to remain fully visible even after augmentations that include rotations and perspective changes.
DATA_PATH="./data/"
FILE_INPUT_PATH = os.path.join(DATA_PATH, "images/cards/scarlet_violet")
FILE_OUTPUT_PATH = os.path.join(DATA_PATH, "processed_images/cards/scarlet_violet/")
BACKGROUND_IMAGE_PATH = os.path.join("./data", "backgrounds/dtd/images/")
NUM_AUGMENTATIONS_PER_IMAGE = 200
target_size = 512
image_size = 416Defining the Transformations
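As a back-of-the-envelope check on that choice: a rectangle rotated by θ has an axis-aligned bounding box of width w·cos(θ) + h·sin(θ) and height w·sin(θ) + h·cos(θ), and with `fit_output=True` that grown bounding box is what has to fit in the frame. So even if a card at roughly 298x416 (standard card aspect ratio, longest side 416) were rotated by the full ±15° used below, it would still fit inside a 512x512 canvas:

```python
import math

w, h = 298, 416           # approx. card dimensions at a 416px longest side
theta = math.radians(15)  # worst-case rotation from the Affine transform below

bb_w = w * math.cos(theta) + h * math.sin(theta)  # ~395 px
bb_h = w * math.sin(theta) + h * math.cos(theta)  # ~479 px
print(f"{bb_w:.0f} x {bb_h:.0f}")  # both under 512, so the card stays fully in frame
```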
### Defining the Transformations
Let's run through the transformations:
- `Affine` is our standard rotations, scaling, and shearing. I wanted rotation to be noticeable but not enough to make the text on the card hard to read, so I set it to -15 to 15 degrees.
- `Perspective` is what it sounds like. I played around with the scale value but left it as the default.
- `RandomBrightnessContrast` changes the brightness and the contrast. The default values are good enough.
- `RandomShadow` places random dark shapes with low opacity on your image.
- `RandomSunFlare` overlays a simulated sun flare on the image.
- `MotionBlur` blurs the image. I brought the `blur_limit` down because it was too much.
- `GaussNoise` applies noise drawn from a Gaussian distribution.
- `ISONoise` tries to replicate the camera ISO noise from taking a photo in low light. Way too intense most of the time.
- `RGBShift` changes the color of the image slightly by shifting the values in the red, green, and blue channels.
- `LongestMaxSize` resizes the image so its longest side matches the value we pass in (its default `p` is 1, so it always runs). We talked about this earlier.
```python
transform = A.Compose([
    A.Affine(
        rotate=(-15, 15),
        fit_output=True,
        p=0.8
    ),
    A.Perspective(
        fit_output=True,
        p=0.7
    ),
    A.RandomBrightnessContrast(p=0.7),
    A.RandomShadow(p=0.4),
    A.RandomSunFlare(p=0.2),
    A.MotionBlur(blur_limit=(3, 5), p=0.3),
    A.GaussNoise(p=0.3),
    A.ISONoise(intensity=(0.05, 0.25), color_shift=(0.01, 0.03), p=0.3),
    A.RGBShift(r_shift_limit=15, g_shift_limit=15, b_shift_limit=15, p=0.3),
    # Cap the card's longest side (image_size, not the final canvas size)
    A.LongestMaxSize(max_size=image_size),
])
```
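Before running the full pipeline, it's worth eyeballing a few outputs. A minimal sketch that previews a single augmentation, reusing the imports and `transform` from above (the card path here is hypothetical; any RGBA PNG will do):

```python
card = cv2.imread("./data/images/cards/scarlet_violet/sv1-1.png", cv2.IMREAD_UNCHANGED)
card = cv2.cvtColor(card, cv2.COLOR_BGRA2RGBA)

# Pass the color channels and the alpha mask together so they stay aligned
preview = transform(image=card[:, :, :3], mask=card[:, :, 3])

cv2.imwrite("preview.jpg", cv2.cvtColor(preview["image"], cv2.COLOR_RGB2BGR))
```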
### The Image Processing Pipeline
Let's focus on our main function by breaking it down into a few steps:
- First, I get `original_images` (a list of all images in the input directory) and `background_files` (all the directories in the background input directory).
- Then we go into the main loop.
  - (2.a) Here we read the image using `cv2` and the `IMREAD_UNCHANGED` flag to include the alpha channel.
  - (2.b) We then convert the image from `BGRA` to `RGBA`. This is because `opencv` reads images as `BGRA` but `albumentations` expects `RGBA`.
  - (2.c) Because `keras` can infer the classes from the directory structure, I separate out the `base_name` to use as the output folder, so every card gets its own class directory of augmented images. The `set_code` was originally used to find the pokemon card's specific set and its JSON data, but I didn't need it so I edited it out.
- The next loop is the actual image processing loop.
  - (3.a) First I pick a random background image from the `DTD` dataset to use as the background of the pokemon card.
  - (3.b) Then I separate the `RGBA` channels into `RGB` and `A` (basically separating the colors from the transparency layer) and apply the transformation to both. This way the mask gets the same rotation, zoom, etc. as the actual image. It is important to note that `albumentations` keeps these channels separate, which leads us to my next step.
  - (3.c) We take the image and the mask from the object `albumentations` gave us and join them back together to clean the transparent areas. `aug_alpha_mask == 0` finds the pixels where `alpha = 0` and we set those pixels to transparent black (`[0, 0, 0, 0]`). Then we extract the newly cleaned and processed mask.
  - (3.d) The next part was the hardest part of the whole code. I went through multiple iterations where either the mask was broken or nothing appeared in the background. Basically, I now want to take the augmented image and place it onto my background.
    - (3.d.1) The images in the `DTD` dataset are all different sizes and, since I wanted my final image to be `512x512`, I had to resize the `DTD` background image so it could be cropped down to `512x512`.
    - (3.d.2) Then I pick a random spot on the resized background image and crop a `512x512` image from it.
    - (3.d.3) Then I pick a random place on the cropped background to put the card, compute the inverse of the mask from step (3.c), which marks where the background will show through, and extract the color portion of the augmented card.
    - (3.d.4) I take the region of the background where the card will be placed, keep the background pixels where the card is transparent using the `mask_inv` from the last step, and keep the card pixels using the mask we created (there is a toy example of this masking right after this list).
    - (3.d.5) The final step is to combine all our previous pieces: the background, the background part of the card (where the card is transparent), and the card itself. We first combine the card and the background showing through it, then place that region back into the full background crop.
    - (3.d.6) Then we save our image.
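The masking in (3.d.4) and (3.d.5) is easier to see on a toy example. Here the "card" is a tiny white square with one transparent pixel, composited onto a gray background region exactly the way the loop below does it:

```python
import cv2
import numpy as np

card_rgb = np.full((4, 4, 3), 255, dtype=np.uint8)  # all-white toy card
mask = np.full((4, 4), 255, dtype=np.uint8)         # fully opaque...
mask[0, 0] = 0                                      # ...except one pixel
mask_inv = cv2.bitwise_not(mask)

roi = np.full((4, 4, 3), 128, dtype=np.uint8)       # gray background region

bg_part = cv2.bitwise_and(roi, roi, mask=mask_inv)          # background shows where the card is transparent
card_part = cv2.bitwise_and(card_rgb, card_rgb, mask=mask)  # card shows where it is opaque
combined = cv2.add(bg_part, card_part)

print(combined[0, 0], combined[1, 1])  # [128 128 128] (background) and [255 255 255] (card)
```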
```python
async def main() -> None:
    logger_manager = AsyncLoggerManager.instance()
    logger_manager.init(log_file="scraping.log")
    logger = logger_manager.get_logger("scrapingImages")
    os.makedirs(FILE_OUTPUT_PATH, exist_ok=True)

    # (1) Get list of images and background directories
    original_images = [img for img in os.listdir(FILE_INPUT_PATH) if img.endswith(('.png', '.jpg', '.jpeg'))]
    background_files = [os.path.join(BACKGROUND_IMAGE_PATH, f) for f in os.listdir(BACKGROUND_IMAGE_PATH)]
    logger.info("Starting image processing...")

    # (2) Main loop
    for image_name in original_images:
        # (2.a) Read image with alpha channel
        image_path = os.path.join(FILE_INPUT_PATH, image_name)
        image = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)
        if image is None:
            logger.warning(f"Failed to read image: {image_path}")
            continue
        # (2.b) Convert from BGRA to RGBA
        image = cv2.cvtColor(image, cv2.COLOR_BGRA2RGBA)
        # (2.c) Extract base name and set code
        base_name, ext = os.path.splitext(image_name)
        set_code = base_name.split('-')[0]
        logger.info(f"Base name: {base_name}")
        logger.info(f"Set code: {set_code}")
        output_image_path = os.path.join(FILE_OUTPUT_PATH, base_name)
        os.makedirs(output_image_path, exist_ok=True)

        # (3) Image processing loop
        for i in range(NUM_AUGMENTATIONS_PER_IMAGE):
            # (3.a) Select random background image
            bg_set_path = random.choice(background_files)
            background_image_files = [f for f in os.listdir(bg_set_path) if f.endswith(('.png', '.jpg', '.jpeg'))]
            bg_image_path = os.path.join(bg_set_path, random.choice(background_image_files))
            background = cv2.imread(bg_image_path)
            if background is None:
                logger.warning(f"Failed to read background image: {bg_image_path}")
                continue
            background = cv2.cvtColor(background, cv2.COLOR_BGR2RGB)

            # (3.b) Separate RGBA channels and apply transformations
            rgb_card = image[:, :, :3]
            alpha_mask = image[:, :, 3]
            augmented_data = transform(image=rgb_card, mask=alpha_mask)

            # (3.c) Recombine and clean transparent areas
            aug_card_rgb = augmented_data['image']
            aug_alpha_mask = augmented_data['mask']
            augmented_image = np.dstack((aug_card_rgb, aug_alpha_mask))
            background_pixels = aug_alpha_mask == 0
            augmented_image[background_pixels] = [0, 0, 0, 0]
            mask = augmented_image[:, :, 3]

            # (3.d) Place augmented image onto background
            # (3.d.1) Resize background so it is large enough to crop from;
            # math.ceil guarantees both dimensions end up at least target_size
            bg_h, bg_w, _ = background.shape
            scale_factor = max(target_size / bg_h, target_size / bg_w)
            new_w = math.ceil(bg_w * scale_factor)
            new_h = math.ceil(bg_h * scale_factor)
            bg_resized = cv2.resize(background, (new_w, new_h))
            # Leftover safety net from before I realized math.ceil above
            # already guarantees at least target_size in both dimensions
            h, w, _ = bg_resized.shape
            if h < target_size or w < target_size:
                logger.warning(
                    f"Resized background is too small ({w}x{h}) to crop. Skipping this one."
                )
                continue

            # (3.d.2) Randomly crop a 512x512 section from the resized background
            y_start = random.randint(0, h - target_size)
            x_start = random.randint(0, w - target_size)
            bg_crop = bg_resized[y_start:y_start+target_size, x_start:x_start+target_size]

            # (3.d.3) Randomly position the augmented image on the cropped background
            card_h, card_w, _ = augmented_image.shape
            y_offset = random.randint(0, max(0, target_size - card_h))
            x_offset = random.randint(0, max(0, target_size - card_w))
            mask_inv = cv2.bitwise_not(mask)
            aug_card_rgb = augmented_image[:, :, 0:3]

            # (3.d.4) Separate out the region from the background and the transparent areas of the card
            roi = bg_crop[y_offset:y_offset+card_h, x_offset:x_offset+card_w]
            bg_part = cv2.bitwise_and(roi, roi, mask=mask_inv)
            card_part = cv2.bitwise_and(aug_card_rgb, aug_card_rgb, mask=mask)

            # (3.d.5) Combine the two parts and place them back onto the background crop
            combined_roi = cv2.add(bg_part, card_part)
            bg_crop[y_offset:y_offset+card_h, x_offset:x_offset+card_w] = combined_roi

            # (3.e) Save the final image
            new_image_name = f"{base_name}_aug_{i+1}.jpg"
            output_path = os.path.join(output_image_path, new_image_name)
            cv2.imwrite(output_path, cv2.cvtColor(bg_crop, cv2.COLOR_RGB2BGR))

    logger.info("Image processing completed")
    logger_manager.stop()

if __name__ == "__main__":
    import sys

    assert sys.version_info >= (3, 10), "Script requires Python 3.10+."
    asyncio.run(main())
```
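One design note on step (3.d.4): `cv2.bitwise_and` treats every nonzero alpha value as fully opaque, so any semi-transparent edge pixels the rotation produces get a hard cutoff. If softer edges matter, a normalized alpha blend is a drop-in replacement for the `bg_part`/`card_part`/`cv2.add` trio (a sketch, not what I used):

```python
# alpha in [0, 1], shaped (h, w, 1) so it broadcasts over the RGB channels
alpha = (mask.astype(np.float32) / 255.0)[:, :, None]
combined_roi = (alpha * aug_card_rgb + (1.0 - alpha) * roi).astype(np.uint8)
```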
## Sample Images
These images are not the ones I will be using to train my model. I originally planned to use the first few sets for training, but I do not have enough personal cards from that time period to test the model properly.
base1-1_aug_8
base1-2_aug_9
base1-3_aug_7
Here are some examples of images I actually used in the model:
sv6-12_aug_198
rsv10pt5-152_aug_10
sv2-56_aug_103
In total, from my 3595 classes, I generated 719,000 images (roughly 150 KB each), taking up about 110 GB of storage.